Estimation of Secure Data Deduplication in Big Data

Author

  • Naresh Kumar
Abstract

Big data refers to collections of large, complex data sets. In a big data environment, data is largely unstructured and may contain many duplicate copies of the same content. Hadoop, an open-source platform designed specifically for big data environments, is used to manage such complex unstructured data, and it handles unstructured data far more efficiently than traditional data-processing tools. To reduce data duplication, the concept of deduplication is applied. This paper presents an evaluation of different chunking and deduplication techniques.
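The chunking-plus-deduplication idea the abstract evaluates can be illustrated with a minimal sketch: split the data into chunks, fingerprint each chunk with a cryptographic hash, and store only the first copy of each fingerprint. The function names, the fixed 4-byte chunk size, and the in-memory dictionary store below are illustrative assumptions, not details from the paper (real systems use larger chunks, often content-defined boundaries, and a persistent chunk store):

```python
import hashlib

def deduplicate(data, chunk_size=4):
    """Fixed-size chunking with SHA-256 fingerprints.

    Returns a 'recipe' (ordered list of fingerprints needed to rebuild
    the data) and a 'store' holding one payload per unique fingerprint.
    """
    store = {}   # fingerprint -> chunk payload (unique chunks only)
    recipe = []  # ordered fingerprints to reconstruct the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # keep only the first copy of a chunk
        recipe.append(fp)
    return recipe, store

def rebuild(recipe, store):
    """Reassemble the original byte stream from the recipe."""
    return b"".join(store[fp] for fp in recipe)

data = b"ABCDABCDXYZ!ABCD"           # 16 bytes, 4 chunks, 3 duplicates
recipe, store = deduplicate(data)
assert rebuild(recipe, store) == data
assert len(store) == 2               # only "ABCD" and "XYZ!" are stored
```

Content-defined chunking (e.g. Rabin fingerprinting) improves on this fixed-size scheme by keeping chunk boundaries stable when bytes are inserted or deleted, which is why the comparative studies referenced here evaluate several chunking strategies.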


Similar Resources

A Review Paper on Hybrid Cloud Approach for Secure Authorized Data Deduplication

Cloud computing is the leading approach to handling big databases as the world moves toward digitization. The amount of digital data in the world is growing exponentially with time. Thus, employing storage optimization techniques is an essential requirement for large storage areas such as cloud storage, and data deduplication is the best storage optimization techniqu...


A Dynamic Deduplication Approach for Big Data Storage

As data increases every day, managing storage devices for this explosive growth of digital data is a very challenging task, and data reduction has become a crucial problem. The deduplication approach plays a vital role in removing redundancy in large-scale cluster-computing storage. As a result, deduplication provides better storage utilization by eliminating redundant copies of data and savi...


Guest Editorial on Advances in Tools and Techniques for Enabling Cyber-Physical-Social Systems - Part II

Part II of the IEEE Transactions on Computational Social Systems Special Issue on Cyber-Physical-Social Systems (CPSS) includes six papers on emerging techniques for radio access networks, data deduplication, big data computing, smart communities, cloud computing, and the Internet of Things. The paper "QoE-Guaranteed and Power-Efficient Network Operation for Cloud Radio Access Network with ...


Ddup - towards a deduplication framework utilising apache spark

This paper introduces a new framework called DeduPlication (DduP). DduP aims to solve large-scale deduplication problems on arbitrary data tuples, bridging the gap between big data, high performance, and duplicate detection. At the moment a first prototype exists, but the overall project is a work in progress. DduP utilises the promising successor of Apache Hadoop MapReduce [Had14]...


Boafft: Distributed Deduplication for Big Data Storage in the Cloud

As data progressively grows within data centers, cloud storage systems continuously face challenges in saving storage capacity and providing the capabilities necessary to move big data within an acceptable time frame. In this paper, we present Boafft, a cloud storage system with distributed deduplication. Boafft achieves scalable throughput and capacity using multiple data servers to dedu...




Publication date: 2017